1. INTRODUCTION


A Cellular Neural Network (CNN) is a hybrid of Cellular Automata and Neural Networks (hence the
name), and it shares the best features of both worlds. Like Neural Networks, its continuous-time
feature allows real-time signal processing, and like Cellular Automata, its local interconnection
feature makes VLSI realization feasible. Its grid-like structure is suitable for solving a
high-order system of first-order nonlinear differential equations on-line and in real time. A CNN
is an analog nonlinear dynamic processor array (see Fig. 1a) characterized by the following features [2]:


1) Each analog processor is capable of processing continuous signals, in either continuous-time or
discrete-time modes.

2)  The  processors  are  placed  on  a  3D  geometric  cellular  grid  (several  2D  layers)  and  are  basically 
identical. 

3)  Interaction  among  processors  is  local  and  mainly  translation  invariant. 

4)  The  mode  of  operation  may  be  transient,  equilibrium,  periodic,  chaotic,  or  combined  with  logic 
(without  A/D  conversion). 




Fig.  1.  CNN  Structure  and  block  diagram. 


The basic circuit unit of a CNN is called a cell [3]. It contains linear and nonlinear circuit elements.
Any cell, C(i,j), is connected only to its neighbor cells, i.e. adjacent cells interact directly with each
other. This intuitive concept is called the neighborhood and is denoted N(i,j). Cells not in the
immediate neighborhood have an indirect effect because of the propagation effects of the dynamics of
the network. Each cell has a state x, input u, and output y. The state of each cell is bounded for all
time t>0 and, after the transient has settled down, a cellular neural network always approaches one
of its stable equilibrium points. This last fact is relevant because it implies that the circuit will not
oscillate. The dynamics of a CNN has both output feedback (A) and input control (B) mechanisms.
The first-order nonlinear differential equation defining the dynamics of a cellular neural network
cell can be written as follows


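$$
C \frac{dx_{ij}(t)}{dt} = -\frac{1}{R}\, x_{ij}(t)
  + \sum_{C(k,l) \in N(i,j)} A(i,j;k,l)\, y_{kl}(t)
  + \sum_{C(k,l) \in N(i,j)} B(i,j;k,l)\, u_{kl} + I
$$

$$
y_{ij}(t) = \frac{1}{2} \left( |x_{ij}(t) + 1| - |x_{ij}(t) - 1| \right) \tag{1}
$$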


where x_ij is the state of cell C(i,j), x_ij(0) is the initial condition of the cell, C is a linear capacitor,
R is a linear resistor, I is an independent current source, A(i,j;k,l) y_kl and B(i,j;k,l) u_kl are voltage
controlled current sources for all cells C(k,l) in the neighborhood N(i,j) of cell C(i,j), and y_ij
represents the output equation.

Notice from the summation operators that each cell is affected by its neighbor cells. A(·) acts on the
output of neighboring cells and is referred to as the feedback operator. B(·) in turn affects the input
control and is referred to as the control operator. Specific entry values of matrices A(·) and B(·) are
application dependent, are space invariant, and are called cloning templates. A current bias I and the
cloning templates determine the transient behavior of the cellular nonlinear network. The equivalent
block diagram of a continuous-time cell implementation is shown in Fig. 1b.
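
As an illustration of how the cloning templates enter the cell dynamics, the following is a minimal Python sketch of the right-hand side of equation (1) for a radius-1 (3x3) neighborhood. The function names, the list-of-lists layout, the template indexing convention (entry [dk+1][dl+1] weights the neighbor at offset (dk, dl)) and the treatment of cells outside the grid as contributing nothing are assumptions made for this sketch; they are not taken from the simulator described here.

```python
def output(x):
    """Piecewise-linear output: y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def cell_derivative(x, u, i, j, A, B, I, C=1.0, R=1.0):
    """dx_ij/dt for cell (i, j) with a 3x3 neighborhood.

    x, u : 2-D lists holding the states and inputs of all cells.
    A, B : 3x3 cloning templates; entry [1][1] weights cell (i, j) itself.
    Cells outside the grid contribute nothing (assumed boundary condition).
    """
    rows, cols = len(x), len(x[0])
    total = -x[i][j] / R + I
    for dk in (-1, 0, 1):
        for dl in (-1, 0, 1):
            k, l = i + dk, j + dl
            if 0 <= k < rows and 0 <= l < cols:
                total += A[dk + 1][dl + 1] * output(x[k][l])  # feedback term
                total += B[dk + 1][dl + 1] * u[k][l]          # control term
    return total / C
```

With C = R = 1 this reduces to the normalized form that is convenient for numerical integration.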

CNNs take a set of analog values as input, and they are programmed via cloning templates.
Programmability is thus one of the most attractive properties of CNNs, but how to choose the
optimal network and how to program it to perform a given task are still topics under investigation.
This is why there is a need for a behavioral CNN simulator capable of helping
investigators design and manipulate cloning templates (“programming”). Existing tools are not
meant to deal with the large number of pixels typical of common image processing
applications [6]. The simulator presented here not only satisfies this need, but can also be used for
testing CNN hardware implementations.

2.  BEHAVIORAL  SIMULATION 


Fig.  2  Raster  simulation  approach 

Recall that equation (1) is space invariant, which means that A(i,j;k,l) = A(i-k,j-l) and B(i,j;k,l) =
B(i-k,j-l) for all i,j,k,l. Therefore, the solution of the system of difference equations can be seen as
a convolution process between the image and the CNN processors. The basic approach is to imagine
a square subimage area centered at (x,y), with the subimage being the same size as the templates
involved in the simulation. The center of this subimage is then moved from pixel to pixel starting,
say, at the top left corner and applying the A and B templates at each location (x,y) to solve the
differential equation, see Fig. 2. This procedure is repeated for each time step, for all the pixels. An
instance of this image scanning-processing is referred to as an “iteration”. The processing stops
when the states of all CNN processors have converged to steady-state values [3]
and the outputs of their neighbor cells are saturated, i.e. they have a ±1 value.
This whole simulation approach is referred to as raster simulation. A simplified algorithm for
this approach is presented below. The part where the integration is involved (i.e. calculation of
the next state) is explained in the Numerical Integration Methods section.


Algorithm: (Single-layer or Raster CNN simulation)

Obtain the input image, initial conditions and templates from user;

/* M,N = # of rows/columns of the image */
while (converged_cells < total # of cells) {
    for (i=1; i<=M; i++)
        for (j=1; j<=N; j++) {

            if (convergence_flag[i][j])
                continue;  /* current cell already converged */

            /* calculation of the next state */
            x_ij(t_{n+1}) = x_ij(t_n) + ∫_{t_n}^{t_{n+1}} f(x(t_n)) dt

            /* convergence criteria */
            if (dx_ij(t_n)/dt = 0 and y_kl = ±1, ∀ C(k,l) ∈ N(i,j)) {
                convergence_flag[i][j] = 1;
                converged_cells++;
            }

        } /* end for */

    /* update the state values of the whole image */
    for (i=1; i<=M; i++)
        for (j=1; j<=N; j++) {
            if (convergence_flag[i][j]) continue;
            x_ij(t_n) = x_ij(t_{n+1});
        }

    #_of_iteration++;

} /* end while */
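
To make the control flow of the raster algorithm concrete, here is a small runnable Python sketch of the same loop. It is only an illustration under several assumptions: the normalized dynamics (C = R = 1), a 3x3 neighborhood, a plain Euler update, cells outside the image contributing nothing, and a numerical tolerance `tol` standing in for the exact test dx/dt = 0. The names `raster_simulate`, `tol` and `max_iter` are hypothetical.

```python
def output(x):
    """Piecewise-linear output: y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def neighbours(i, j, rows, cols):
    """Yield (dk, dl, k, l) for every in-grid cell of the 3x3 neighborhood."""
    for dk in (-1, 0, 1):
        for dl in (-1, 0, 1):
            k, l = i + dk, j + dl
            if 0 <= k < rows and 0 <= l < cols:
                yield dk, dl, k, l


def derivative(x, u, i, j, A, B, I):
    """Normalized right-hand side of the cell equation (C = R = 1)."""
    s = -x[i][j] + I
    for dk, dl, k, l in neighbours(i, j, len(x), len(x[0])):
        s += A[dk + 1][dl + 1] * output(x[k][l]) + B[dk + 1][dl + 1] * u[k][l]
    return s


def raster_simulate(u, x0, A, B, I, dt=0.1, tol=1e-6, max_iter=10000):
    """Iterate over all pixels until every cell is flagged as converged."""
    rows, cols = len(u), len(u[0])
    x = [row[:] for row in x0]
    converged = [[False] * cols for _ in range(rows)]
    n_converged, iterations = 0, 0
    while n_converged < rows * cols and iterations < max_iter:
        x_next = [row[:] for row in x]
        for i in range(rows):
            for j in range(cols):
                if converged[i][j]:
                    continue  # current cell already converged
                dx = derivative(x, u, i, j, A, B, I)
                x_next[i][j] = x[i][j] + dt * dx  # Euler step
                saturated = all(abs(x[k][l]) >= 1.0
                                for _, _, k, l in neighbours(i, j, rows, cols))
                if abs(dx) < tol and saturated:
                    converged[i][j] = True
                    n_converged += 1
        x = x_next  # update the state values of the whole image
        iterations += 1
    return [[output(v) for v in row] for row in x], iterations
```

The next states are written to a separate array and swapped in after the scan, which mirrors the separate update pass of the algorithm: every cell in an iteration sees the states from the previous iteration.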


The raster approach implies that each pixel is mapped onto a CNN processor. That is, we have an
image processing function in the spatial domain that can be expressed as:

g(x,y) = T(f(x,y)) (2)

where f(·) is the input image, g(·) the processed image, and T is an operator on f(·) defined over the
neighborhood of (x,y). From the point of view of hardware implementation, this is a very exhaustive
approach. For practical applications, on the order of 250,000 pixels, the hardware would require an
enormous number of processors, which would make its implementation unfeasible. An alternative
is to multiplex the image processing operator. A time-multiplexed CNN simulator is presented in
a companion paper [1].

3. NUMERICAL INTEGRATION METHODS


The  CNN  is  described  by  a  system  of  nonlinear  differential  equations.  Therefore,  it  is  necessary  to 
discretize  the  differential  equation  for  performing  behavioral  simulation.  For  computational 
purposes,  a  normalized  time  differential  equation  describing  CNN  is  used  [4]: 

$$
f(x(n\tau)) := \frac{dx_{ij}(n\tau)}{d\tau} = -x_{ij}(n\tau)
  + \sum_{C(k,l) \in N(i,j)} A(i,j;k,l)\, y_{kl}(n\tau)
  + \sum_{C(k,l) \in N(i,j)} B(i,j;k,l)\, u_{kl} + I \tag{3}
$$

$$
y_{ij}(n\tau) = \frac{1}{2} \left( |x_{ij}(n\tau) + 1| - |x_{ij}(n\tau) - 1| \right)
$$

where τ is the normalized time. For the purpose of solving the initial-value problem, well
established single-step methods of numerical integration are used [5]. These methods
can be derived using the definition of the definite integral

$$
x_{ij}((n+1)\tau) - x_{ij}(n\tau) = \int_{\tau_n}^{\tau_{n+1}} f(x(n\tau))\, d(n\tau) \tag{4}
$$

Three of the most widely used single-step algorithms are used in the CNN behavioral simulator
described here. They are Euler's algorithm, the Improved Euler Predictor-Corrector algorithm
and the Fourth-Order (quartic) Runge-Kutta algorithm. These methods differ in the way they
evaluate the integral presented in (4).

Euler’s  method  is  the  simplest  of  all  algorithms  for  solving  ODEs.  It  is  an  explicit  formula  which 
uses  the  Taylor-series  expansion  to  calculate  the  approximation 


$$
x_{ij}((n+1)\tau) = x_{ij}(n\tau) + \tau\, f(x(n\tau)) \tag{5}
$$

The Improved Euler Predictor-Corrector method uses both explicit (predictor) and implicit
(corrector) formulae. The integral is calculated by multiplying the step size τ with the average
of the derivative of x(nτ) and the derivative of the predicted x^p((n+1)τ) at the next time step:

$$
x_{ij}((n+1)\tau) = x_{ij}(n\tau) + \frac{\tau}{2} \left[ f(x(n\tau)) + f(x^{p}((n+1)\tau)) \right] \tag{6}
$$


The  Fourth-Order  Runge-Kutta  method  is  the  most  costly  among  the  three  methods  in  terms  of 
computation  time,  as  it  requires  four  derivative  evaluations  per  time  step.  However,  its  high  cost 
is  compensated  by  its  accuracy  in  transient  behavior  analysis. 


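$$
x_{ij}((n+1)\tau) = x_{ij}(n\tau) + \frac{1}{6} \left( k_1^{ij} + 2k_2^{ij} + 2k_3^{ij} + k_4^{ij} \right) \tag{7}
$$

where

$$
\begin{aligned}
k_1^{ij} &= \tau\, f(x_{ij}(n\tau)) \\
k_2^{ij} &= \tau\, f\!\left(x_{ij}(n\tau) + \tfrac{1}{2} k_1^{ij}\right), \quad \forall\, C(l,m) \in N(i,j) \\
k_3^{ij} &= \tau\, f\!\left(x_{ij}(n\tau) + \tfrac{1}{2} k_2^{ij}\right), \quad \forall\, C(l,m) \in N(i,j) \\
k_4^{ij} &= \tau\, f\!\left(x_{ij}(n\tau) + k_3^{ij}\right), \quad \forall\, C(l,m) \in N(i,j)
\end{aligned}
$$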


where f(·) is computed according to (1). There are many single-step methods available for this
purpose. One option worth considering is the combination of two methods in solving for the
solution. Since the fourth-order Runge-Kutta is among the most widely used single-step methods
for starting the solution of the initial-value problem in ODEs, the Predictor-Corrector method for
continuing the solution can be combined with the Runge-Kutta starter to make a very efficient
computer simulation method for solving the problem.
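
The three single-step updates can be compared side by side on a scalar test problem. The Python sketch below is illustrative only: it applies each update to dx/dτ = -x (chosen here because its exact solution is known, not because it appears in the simulator), uses the Euler step as the predictor of the Improved Euler method, and all function names are hypothetical.

```python
import math


def euler_step(f, x, tau):
    """Explicit Euler update."""
    return x + tau * f(x)


def improved_euler_step(f, x, tau):
    """Predictor-corrector: Euler prediction, then averaged derivatives."""
    x_pred = x + tau * f(x)
    return x + 0.5 * tau * (f(x) + f(x_pred))


def rk4_step(f, x, tau):
    """Classical fourth-order Runge-Kutta update (four derivative evaluations)."""
    k1 = tau * f(x)
    k2 = tau * f(x + 0.5 * k1)
    k3 = tau * f(x + 0.5 * k2)
    k4 = tau * f(x + k3)
    return x + (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def integrate(step, f, x0, tau, n_steps):
    """Apply a single-step method n_steps times starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = step(f, x, tau)
    return x


def decay(x):
    """Test problem dx/dtau = -x, with exact solution x0 * exp(-tau)."""
    return -x


methods = {"euler": euler_step,
           "improved_euler": improved_euler_step,
           "rk4": rk4_step}
exact = math.exp(-1.0)  # exact value at tau = 1 for x0 = 1
errors = {name: abs(integrate(step, decay, 1.0, 0.1, 10) - exact)
          for name, step in methods.items()}
```

For this test problem the entries of `errors` decrease from Euler to Improved Euler to Runge-Kutta, which is the accuracy ordering that the extra derivative evaluations pay for.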


4.  RASTER  SIMULATION  RESULTS  AND  COMPARISONS 

All  the  simulations  reported  here  are  performed  using  a  Sun  SPARC2  workstation,  and  the 
simulation  time  used  for  comparisons  is  the  actual  CPU  time  used.  The  input  image  format  for  this 
simulator  is  the  X  windows  bitmap  format  (xbm),  which  is  commonly  available  and  easily 
convertible  from  popular  image  formats  like  GIF  or  JPEG. 


Fig. 3. Image processing. (a) After Averaging template. (b) After Averaging and Edge Detection templates.


Fig. 3 shows results of the raster simulator obtained from a complex image of 125,535 pixels. For
this example an Averaging template followed by an Edge Detection template was applied to the
original image to yield the images displayed in Figs. 3a and 3b, respectively.

Since speed is one of the main concerns in the simulation, finding the maximum step size that still
yields convergence for a template can be helpful in speeding up the system. The speed-up can be
achieved by selecting an appropriate Δt for that particular template. Even though the maximum step
size may vary slightly from one image to another, the values in Fig. 4 still serve as good references.
These results were obtained by trial and error over more than 100 simulations on a diamond figure.


Fig. 4. Maximum step size that still yields convergence for 3 different templates (Edge Detection, Averaging, Connected Comp.)

The importance of selecting an appropriate Δt can be easily visualized in Fig. 5. If the step size
chosen is too small, it might take many iterations, hence a longer time, to achieve convergence. On
the other hand, if the step size taken is too large, the simulation might not converge at all or it might
converge to erroneous steady-state values; the latter can be observed for the Euler integration method.
The results of Fig. 5 were obtained by simulating a small image of size 16x16 (256 pixels) using
an Edge Detection template on a diamond figure. In Fig. 6, simulation times using an
Averaging template for images of sizes up to about 250,000 pixels are shown.
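
The trade-off can be reproduced on a toy problem. The Python sketch below counts Euler iterations for a single saturated cell whose normalized dynamics reduce to dx/dτ = -x + c; this linear stand-in, the constant c = 3, the tolerance and the function name are all assumptions made for illustration, and the numbers it produces are not the results of Fig. 5. For this particular equation the Euler error is multiplied by (1 - Δt) at every step, so the iteration settles only for 0 < Δt < 2.

```python
def euler_iterations(tau, target=3.0, x0=1.0, tol=1e-6, max_iter=10000):
    """Count Euler steps until |dx/dtau| < tol for dx/dtau = -x + target.

    Returns None when the iteration blows up or fails to settle within
    max_iter steps.
    """
    x = x0
    for n in range(max_iter):
        dx = -x + target
        if abs(dx) < tol:
            return n
        x += tau * dx
        if abs(x) > 1e12:  # diverging: step size too large
            return None
    return None


small = euler_iterations(0.01)   # tiny step: many iterations
medium = euler_iterations(0.5)   # moderate step: far fewer iterations
large = euler_iterations(2.5)    # beyond the stability limit: no convergence
```

In this toy setting `small` is much larger than `medium`, and `large` is `None`, matching the qualitative behavior described above.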


Fig. 5. Iteration & simulation time comparisons of the three methods using the Edge Detection
template (horizontal axis: step size Δt)

Fig. 6. Simulation time (log sec) vs. number of pixels for images of sizes ranging from 64x64
(4096 pixels) to 415x603 (250,245 pixels) using Averaging template

5.  TIME  MULTIPLEXING  SIMULATION 

Under  this  approach  one  can  define  a  block  of  CNN  processors  which  will  process  a  subimage  whose 
number  of  pixels  is  equal  to  the  number  of  CNN  processors  in  the  block.  The  processing  within  this 
subimage  follows  the  raster  approach  described  in  the  companion  paper  [2].  Once  convergence  is 
achieved,  a  new  subimage  is  processed.  This  procedure  is  repeated  until  the  whole  image  has  been 
scanned.  It  is  obvious  that  with  this  approach  the  hardware  implementation  becomes  feasible  since 
now  the  number  of  CNN  processors  is  finite.  Also,  the  entire  image  is  scanned  only  once  since  each 
block  is  allowed  to  fully  converge. 

Even though the approach seems tempting, an important observation is necessary: the processed
border pixels in each subimage may have incorrect values since they are processed without
neighboring information. Fortunately, the latency of CNNs is such that only local interactions are
important. Hence, to cope with this problem, two sufficient conditions must be considered
while doing time-multiplexing simulation. To ensure that each border cell properly
interacts with its neighbors it is necessary: 1) to have a belt of pixels from the original image around
the subimage, and 2) to have pixel overlaps between adjacent subimages.

It is possible to quantify the processing error of any border cell C(i,j) with a neighborhood radius of 1.
Let us compute independently the error due to the feedforward operator and then the error due to
interactions among cells for two horizontally adjacent processing blocks. The absolute processing error
due only to the effect of the B template is obtained by subtracting the erroneous state value from the
error-free state using eq. (1). This yields,

i-3 

^fj  =  (8) 

where b_{i,j+1} are the missing entries from the B template due to the absence of input signals u_{i,j+1} and
sign(·) is the sign function. The latter function is used to represent the status of a pixel, e.g. black
= 1 and white = -1. Notice that the error is both image and template dependent. In other words, the
steady state of a border cell may converge to an incorrect value due to the absence of its neighbors'
weighted input. Given the local interconnectivity properties of CNN, one can conclude that the
minimum width of the input belt of pixels is equal to the neighborhood radius of the CNN.

Let  us  study  now  the  interactions  among  cells.  For  this  effect  we  can  compute  the  absolute  error 
in  a  similar  form.  Disregarding  for  the  moment  the  B  template  this  error  is 

i-3 

where a_{i,j+1} are the missing entries from the A template due to the absence of weighted output
signals y_{i,j+1}. The problem in this situation is more involved because the output signals depend
on the state of their corresponding cells. To minimize the error an overlap of pixels between two
adjacent blocks is proposed. The minimum overlap width must be proportional to twice the
neighborhood radius of the CNN.

The general time-multiplexing procedure consists of iterating each block (subimage) until all CNN
cells within the block converge. The state variables x of the converged block
are the values used for the final output image. In the overlapping procedure the left side of the
overlapped cells takes converged values from Block_i and the right side from Block_{i+1}, see Fig. 7. In
our simulator the number of overlapping columns or rows between adjacent blocks is defined
by the user. Even though a higher number of overlapping columns or rows means more accurate
simulation of neighboring effects on the border cells, for applications where the correct final state
is of more importance than the transient states, an overlap of two is usually sufficient. An even
number of overlapping cells is recommended, since the converged cells in the
overlapped region can then be evenly divided between the two adjacent blocks.
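
The bookkeeping behind the overlap can be sketched along one image axis. The Python functions below are a hypothetical illustration, not the simulator's code: `block_origins` places blocks with a stride of block size minus overlap and stops once a block reaches the image edge (an assumption that avoids a redundant trailing block), and `kept_range` splits each shared overlap so that its first half is taken from the earlier block and the remainder from the later one.

```python
def block_origins(length, block, overlap):
    """Start indices of the blocks along one axis (stride = block - overlap)."""
    if not 0 <= overlap < block:
        raise ValueError("overlap must satisfy 0 <= overlap < block")
    starts, start = [], 0
    while True:
        starts.append(start)
        if start + block >= length:  # this block already reaches the edge
            break
        start += block - overlap
    return starts


def kept_range(start, block, overlap, length):
    """Half-open range of indices whose converged states this block supplies.

    The first overlap // 2 shared cells come from the earlier block and the
    rest from the later block. Nothing is trimmed at the image edges.
    """
    half = overlap // 2
    lo = start if start == 0 else start + half
    end = min(start + block, length)
    hi = end if end == length else end - (overlap - half)
    return lo, hi


starts = block_origins(26, 10, 2)
ranges = [kept_range(s, 10, 2, 26) for s in starts]
```

With an even overlap the two halves are equal, which is the reason an even number of overlapping cells divides cleanly between adjacent blocks.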


Fig. 7. CNN multiplexing with overlapped cells (8x8 CNN block)

With the added overlapping feature, better neighboring interactions are achieved, but at the same
time an increase in computation time is inevitable. However, because the original input image
has been divided into small CNN subimages, the chance of a subimage having
all its pixels black or white is high. This fact can be exploited by another feature added to the
time-multiplexing simulation to improve computation times: the savings in simulation time come
from avoiding repetitive simulations of all-black and all-white subimages.

The idea behind this time-saving scheme is that when the very first all-black/all-white block is
encountered, after processing that block, the final states of the block are stored separately from the
whole image. When subsequent all-black/all-white blocks are found, there is no need to simulate
these blocks since the converged states are readily available in memory, thus avoiding the most
time-consuming part of the simulation, which is the numerical integration.
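
A minimal Python sketch of this caching idea follows. It assumes that pixels are stored as +1 (black) and -1 (white), that `simulate` is a hypothetical callable returning the converged states of one block, and that two uniform blocks of the same color and shape converge to the same states; the cache key and the function name are choices made for this sketch only.

```python
def process_blocks(blocks, simulate):
    """Run `simulate` on each block, reusing stored results for uniform blocks.

    blocks   : iterable of 2-D lists with pixel values +1 (black) / -1 (white)
    simulate : hypothetical callable mapping a block to its converged states
    Returns the list of converged blocks and the number of simulations run.
    """
    cache = {}  # (pixel value, rows, cols) -> stored final states
    results, n_simulated = [], 0
    for block in blocks:
        first = block[0][0]
        uniform = all(v == first for row in block for v in row)
        key = (first, len(block), len(block[0]))
        if uniform and key in cache:
            # converged states are already in memory: skip the integration
            results.append([row[:] for row in cache[key]])
            continue
        final = simulate(block)
        n_simulated += 1
        if uniform:
            cache[key] = [row[:] for row in final]
        results.append(final)
    return results, n_simulated
```

Only the first all-black and the first all-white block of a given shape are integrated; every later uniform block costs one scan of its pixels and a copy.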

For  the  purpose  of  better  understanding  the  overall  idea  of  this  simulation  approach,  the  simplified 
algorithm  is  presented  below: 

Algorithm: (Time-Multiplexing CNN simulation)

B = {C_ij | i = 1,...,block_x ∧ j = 1,...,block_y};  B_b ⊂ B = set of border cells (lower left corner)
overlap = number of cell overlaps;
belt = width of input belt;  M = number of rows of the image;  N = number of columns of the image

for (i=0; i < M; i += block_x - overlap)
    for (j=0; j < N; j += block_y - overlap)
    {
        /* load initial conditions for the cells in the block except for those in the borders */
        for (p=-belt; p < block_x + belt; p++)
            for (q=-belt; q < block_y + belt; q++) {
                load x_{i+p,j+q}(0);
            } /* end for */

        /* if the block is all white or black don't process it */
        if (x_{i+p,j+q} = -1 ∀ C ∈ B  or  x_{i+p,j+q} = 1 ∀ C ∈ B)
        {
            obtain the final states from memory;
            continue;
        }

        do { /* normal raster simulation */
            for (p=0; p < block_x; p++) {
                for (q=0; q < block_y; q++)
                {   /* calculation of the next state excluding the belt of inputs */
                    x_{i+p,j+q}(t_{n+1}) = x_{i+p,j+q}(t_n) + ∫_{t_n}^{t_{n+1}} f(x(t_n)) dt,  ∀ C ∈ B

                    /* convergence criteria */
                    if (dx_{i+p,j+q}(t_n)/dt = 0 and y_kl = ±1 ∀ C(k,l) ∈ N(i+p, j+q)) {
                        converged_cells++;
                    }
                } /* end for */
            }
            /* update state values */
        } while (converged_cells < (block_x * block_y));

        /* store new state values excluding the ones corresponding to the border cells */
        store x_ij  ∀ C_ij ∈ B \ B_b
    } /* end for */

6. SIMULATION RESULTS AND COMPARISONS


The general features of the raster CNN simulator are preserved in the time-multiplexing simulator,
namely the choice of three integration methods, the format of the input image file and the capability
of processing any size of input image. Some representative simulation results showing the effects of
the added features in the time-multiplexing simulator are presented in this section.

In the time-multiplexing simulation involving the time-saving scheme, the number of
all-black/all-white blocks that will be encountered during a simulation depends on the image itself
and the block size chosen by the user. For example, Fig. 8a evidently will benefit from this
time-saving scheme, especially from not needing to simulate the all-white block more than once.


Actual numbers show how much improvement is achieved. The size of Fig. 8a is
395x403 (159,185 pixels), and an edge detection template is used for simulation comparisons. First,
using the raster simulator presented in [2], the simulation took 243.51 secs. Next, with the regular
time-multiplexing simulator (with overlapping and input belt) the simulation took 363.28 secs.
Finally, the time-multiplexing simulator with the time-saving scheme performed the same simulation in
244.22 secs, almost a 33% improvement over the regular time-multiplexing. The size of the block
used was 10x10, with two rows/columns overlapping.
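
The quoted percentages follow directly from the three timings. The short Python calculation below only restates that arithmetic with the figures reported above; the variable names are illustrative.

```python
raster_s = 243.51              # raster simulator, seconds
multiplexed_s = 363.28         # regular time-multiplexing, seconds
multiplexed_saving_s = 244.22  # time-multiplexing with time-saving scheme

# relative improvement of the time-saving scheme over regular time-multiplexing
improvement = (multiplexed_s - multiplexed_saving_s) / multiplexed_s

# remaining overhead of the time-saving scheme relative to raster simulation
overhead_vs_raster = (multiplexed_saving_s - raster_s) / raster_s

pixels = 395 * 403             # image size used for the comparison
```

Here `improvement` is about 0.328, i.e. just under 33%, and `overhead_vs_raster` is about 0.003, so with the time-saving scheme the time-multiplexed run costs roughly 0.3% more CPU time than the raster run for this image.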


Fig. 8. Time-multiplexing CNN simulation. (a) Original image. (b) After Edge Detection.

By taking the lower right-hand corner of Fig. 8a, the cropped image in Fig. 9a, we can easily
visualize the effects that the overlapping pixels and the belt of inputs have on the simulation. By
choosing the block size to be 10x10, and applying an edge detection template, the results obtained
by simulating the image without overlapping cells and belt of inputs and with the same features
added are shown in Fig. 9b and Fig. 9c, respectively. A clear loss of neighboring interaction is
shown in Fig. 9b, which is recovered by the overlapping and belt of inputs, as shown in Fig. 9c.

Fig. 9. Time-multiplexing simulation. (a) Original image. (b) Without overlapping & belt of
inputs. (c) With overlapping of two & belt of inputs.

One interesting aspect to be considered in time-multiplexing simulations is the optimum block
size of the CNN. Fig. 10 displays simulation timing results versus different block sizes. The solid-line
results correspond to computations that were carried out on a 125,235-pixel image without
black/white blocks and using an Averaging template. In this case, it can be observed that beyond
a block size of 50x50 no noticeable CPU time improvement is made. The results presented by the
dashed line correspond to computations that were carried out using an Edge Detection template on
an image with many black/white blocks. For smaller block sizes many black/white blocks were
found, which makes the computation efficient. Beyond block sizes of 80x80 no more black/white
blocks were found, which is why the computation time peaks there. Fig. 11 shows the average number
of iterations within each CNN block and for the whole image for the Averaging template case. These
results show the complexity of operations that must be considered when deciding the actual CNN
hardware size.


Fig. 10. CPU time (seconds) vs. CNN block size (NxN).

Fig. 11. Number of iterations vs. CNN block size (NxN).

7.  CONCLUSION 

As researchers come up with more and more CNN applications, an efficient and powerful
simulator is needed. The simulator presented here meets this need in three ways: 1) depending
on the accuracy required for the simulation, the user can choose from three popular methods to
perform the numerical integration, 2) the input image format is the X Windows bitmap (xbm),
which is commonly available, and 3) the input image can be of any size, allowing simulation of
images found in common practice.

While keeping the features of the raster simulator, the time-multiplexing simulator presented here
processes the image block by block, simulating the CNN the way the hardware would if the number of
CNN processors in the hardware is smaller than the input image, which is usually the case with
images of practical size.

With the overlapping and external belt of inputs, the neighboring interaction between CNN blocks
is ensured, but at the same time computation costs also increase. However, with the added feature
of processing the all-black and all-white blocks just once for the entire simulation, the simulation
time is brought down to the level of raster simulation, if not better in some cases, depending on
the input image and the size of block and overlap chosen.